Bag-of-Word-Groups (BoWG): A Robust and Efficient Loop Closure Detection Method Under Perceptual Aliasing

Fei, Xiang, Tian, Tina, Choset, Howie, Li, Lu

arXiv.org Artificial Intelligence

Loop closure is critical in Simultaneous Localization and Mapping (SLAM) systems to reduce accumulated drift and ensure global mapping consistency. However, conventional methods struggle in perceptually aliased environments, such as narrow pipes, due to vector quantization, feature sparsity, and repetitive textures, while existing solutions often incur high computational costs. This paper presents Bag-of-Word-Groups (BoWG), a novel loop closure detection method that achieves superior precision-recall, robustness, and computational efficiency. The core innovation lies in the introduction of word groups, which capture the spatial co-occurrence and proximity of visual words to construct an online dictionary. Additionally, drawing inspiration from probabilistic transition models, we incorporate temporal consistency directly into similarity computation with an adaptive scheme, substantially improving precision-recall performance. The method is further strengthened by a feature distribution analysis module and dedicated post-verification mechanisms. To evaluate the effectiveness of our method, we conduct experiments on both public datasets and a confined-pipe dataset we constructed. Results demonstrate that BoWG surpasses state-of-the-art methods, including both traditional and learning-based approaches, in terms of precision-recall and computational efficiency. Our approach also exhibits excellent scalability, achieving an average processing time of 16 ms per image across 17,565 images in the Bicocca25b dataset.
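The abstract does not give the paper's exact construction, but the word-group idea can be illustrated with a minimal sketch: pair up visual words whose keypoints lie within a pixel radius, treat each pair as a "word group," and compare images by cosine similarity over group counts. The radius value and the pairwise grouping rule below are assumptions for illustration, not the paper's method.

```python
# Hedged sketch of spatial word grouping for loop closure scoring.
# Assumption: a "word group" is any pair of visual words whose keypoints
# are within `radius` pixels of each other (simplification of BoWG).
from collections import Counter
from itertools import combinations
import math

def word_groups(features, radius=50.0):
    # features: list of (word_id, x, y) detections in one image
    groups = Counter()
    for (wa, xa, ya), (wb, xb, yb) in combinations(features, 2):
        if math.hypot(xa - xb, ya - yb) <= radius:
            groups[tuple(sorted((wa, wb)))] += 1
    return groups

def similarity(g1, g2):
    # cosine similarity between two sparse group-count vectors
    dot = sum(c * g2[k] for k, c in g1.items())
    n1 = math.sqrt(sum(c * c for c in g1.values()))
    n2 = math.sqrt(sum(c * c for c in g2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```

Because groups encode which words co-occur nearby, two images sharing the same vocabulary but with different spatial layouts score lower than a plain bag-of-words comparison would suggest, which is what helps under repetitive textures.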


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Originality: A major contribution of the paper is posing image set summarization as a submodular optimization problem; to the knowledge of this reviewer, this is a novel view of the problem. Together with a new dataset and the adaptation of ROUGE to a significantly different application domain, this paper has several novel contributions to the state of the art.
Significance: According to this reviewer, this work fits well in the topics of interest for NIPS, to which it makes a significant contribution.
Q2: Please summarize your review in 1-2 sentences: The paper presents an elegant formulation of the problem of image collection summarization along with a new dataset and an evaluation metric.


Towards an Accurate and Effective Robot Vision (The Problem of Topological Localization for Mobile Robots)

Boros, Emanuela

arXiv.org Artificial Intelligence

Topological localization is a fundamental problem in mobile robotics, since robots must be able to determine their position in order to accomplish tasks. Visual localization and place recognition are challenging due to perceptual ambiguity, sensor noise, and illumination variations. This work addresses topological localization in an office environment using only images acquired with a perspective color camera mounted on a robot platform, without relying on temporal continuity of image sequences. We evaluate state-of-the-art visual descriptors, including Color Histograms, SIFT, ASIFT, RGB-SIFT, and Bag-of-Visual-Words approaches inspired by text retrieval. Our contributions include a systematic, quantitative comparison of these features, distance measures, and classifiers. Performance was analyzed using standard evaluation metrics and visualizations, extending previous experiments. Results demonstrate the advantages of proper configurations of appearance descriptors, similarity measures, and classifiers. The quality of these configurations was further validated in the Robot Vision task of the ImageCLEF evaluation campaign, where the system identified the most likely location of novel image sequences. Future work will explore hierarchical models, ranking methods, and feature combinations to build more robust localization systems, reducing training and runtime while avoiding the curse of dimensionality. Ultimately, this aims toward integrated, real-time localization across varied illumination and longer routes.
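One descriptor/distance/classifier combination of the kind the study compares can be sketched in a few lines: colour histograms matched with the chi-squared distance under a 1-nearest-neighbour classifier. The room labels and histogram parameters below are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch: topological localization as 1-NN over colour histograms
# with the chi-squared distance (one of several configurations compared).
import numpy as np

def color_histogram(img, bins=8):
    # img: H x W x 3 uint8 array; returns an L1-normalised joint RGB histogram
    h, _ = np.histogramdd(img.reshape(-1, 3), bins=(bins,) * 3,
                          range=((0, 256),) * 3)
    h = h.ravel()
    return h / h.sum()

def chi2(p, q, eps=1e-10):
    # chi-squared distance between two normalised histograms
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

def classify(query, gallery):
    # gallery: list of (room_label, histogram); return label of nearest match
    return min(gallery, key=lambda lh: chi2(query, lh[1]))[0]
```

Swapping in SIFT-based bag-of-visual-words vectors or a different distance only changes the descriptor and `chi2` functions; the evaluation loop stays the same, which is what makes the systematic comparison in the paper tractable.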


e4dd5528f7596dcdf871aa55cfccc53c-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all reviewers for their detailed and constructive comments. "problem [...] is relevant and important," "dataset is original." Apologies for the confusion; we will clarify. We will include results for the upper bound in Table 2, as requested by R2.
R1: Contribution of stage 2: if we remove stage 2 and zero out the weights for the text embedding, accuracy is only 0.677.
R1: "Sweet spot" for text data: we will include an experiment that trains with the first k sentences (varying k).




Hierarchy-of-Visual-Words: a Learning-based Approach for Trademark Image Retrieval

Lourenço, Vítor N., Silva, Gabriela G., Fernandes, Leandro A. F.

arXiv.org Artificial Intelligence

From the background, the procedure extracts the holes' shapes and associates them with the component shapes' list (lines 7 and 8). The foreground shapes are used in the next iterations (lines 5 and 9) until all component shapes have been extracted from the initial binary trademark image. Shape feature extraction consists of building a feature vector for each component shape of a given trademark image (Figs. 1 (d) and (k)). These 29-dimensional feature vectors combine region-based and contour-based descriptors. The shape's region is described by the 25 moments of the Zernike polynomials (ZM) of order p from 0 to 8:

    Z_{p,q} = ((p + 1) / π) Σ_ρ Σ_θ V*_{p,q}(ρ, θ) I(ρ, θ),    (1)

where ρ = √(x² + y²) is the length of the vector from the origin to pixel (x, y), θ is the angle between that vector and the x-axis in the counter-clockwise direction, and V_{p,q}(ρ, θ) is a Zernike polynomial of order p with repetition q that forms a complete set over the interior of the unit disk inscribing the component shape: V_{p,q}(ρ, θ) = R_{p,q}(ρ) exp(iqθ).
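The Zernike moment of Eq. (1) can be computed directly by sampling the unit disk inscribed in the shape image. The sketch below is a generic, unoptimised implementation of the standard formula (with V* the complex conjugate of V), not the paper's code; the radial polynomial R_{p,q} follows its usual factorial expansion.

```python
# Hedged sketch: discrete Zernike moment Z_{p,q} over the unit disk
# inscribed in a square image, per Eq. (1).
import numpy as np
from math import factorial

def radial_poly(p, q, rho):
    # R_{p,q}(rho), defined for |q| <= p with p - |q| even
    q = abs(q)
    R = np.zeros_like(rho)
    for s in range((p - q) // 2 + 1):
        c = ((-1) ** s * factorial(p - s)) / (
            factorial(s) * factorial((p + q) // 2 - s)
            * factorial((p - q) // 2 - s))
        R += c * rho ** (p - 2 * s)
    return R

def zernike_moment(img, p, q):
    # img: square grayscale/binary array; the unit disk is inscribed in it
    n = img.shape[0]
    ys, xs = np.mgrid[0:n, 0:n]
    x = (2 * xs - n + 1) / (n - 1)   # map pixel grid to [-1, 1]
    y = (2 * ys - n + 1) / (n - 1)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0                # restrict to the unit disk
    V_conj = radial_poly(p, q, rho) * np.exp(-1j * q * theta)
    return (p + 1) / np.pi * np.sum(img[mask] * V_conj[mask])
```

A useful sanity check is rotation invariance: rotating the image only multiplies Z_{p,q} by a phase factor exp(-iqΔθ), so the magnitudes |Z_{p,q}| used in the feature vector are unchanged.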


Learning Mixtures of Submodular Functions for Image Collection Summarization

Sebastian Tschiatschek, Rishabh K. Iyer, Haochen Wei, Jeff A. Bilmes

Neural Information Processing Systems

We address the problem of image collection summarization by learning mixtures of submodular functions. Submodularity is useful for this problem since it naturally represents characteristics such as fidelity and diversity, desirable for any summary. Several previously proposed image summarization scoring methodologies, in fact, instinctively arrived at submodularity. We provide classes of submodular component functions (including some which are instantiated via a deep neural network) over which mixtures may be learnt. We formulate the learning of such mixtures as a supervised problem via large-margin structured prediction.
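Once a submodular score has been learnt, a summary is typically extracted by the classic greedy algorithm, which gives a (1 - 1/e)-approximation for monotone submodular maximisation. The facility-location function below is one generic submodular component of the kind such mixtures use; it is an illustrative sketch, not the paper's learned mixture.

```python
# Hedged sketch: greedy summary extraction under a monotone submodular
# facility-location score (a generic component, not the learned mixture).
import numpy as np

def facility_location(summary, sim):
    # sim[i, j]: similarity between images i and j; summary: list of indices.
    # Each image is credited with its best representative in the summary.
    if not summary:
        return 0.0
    return float(np.sum(np.max(sim[:, summary], axis=1)))

def greedy_summary(sim, k):
    # classic greedy: repeatedly add the item with largest marginal gain
    chosen = []
    for _ in range(k):
        rest = [i for i in range(sim.shape[0]) if i not in chosen]
        best = max(rest, key=lambda i: facility_location(chosen + [i], sim))
        chosen.append(best)
    return chosen
```

The diminishing-returns property is visible directly: each item added to the summary can only raise the per-image maxima, and raises them by less the larger the summary already is, which is why greedy selection naturally balances fidelity and diversity.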